Exemplar-based Visualization of Large Document Corpus
Abstract
With the rapid growth of the World Wide Web and electronic information services, text corpora are becoming available online at an incredible rate. By displaying text data in a logical layout (e.g., color graphs), text visualization offers a direct way to observe the documents and to understand the relationships among them. In this paper, we propose a novel technique, Exemplar-based Visualization (EV), to visualize an extremely large text corpus. Capitalizing on recent advances in matrix approximation and decomposition, EV presents a probabilistic multidimensional projection model in the low-rank text subspace with a sound objective function. The topic proportions of each document, i.e., its probabilities over the topics, are obtained through iterative optimization and then embedded into a low-dimensional space using parameter embedding. By selecting representative exemplars, we obtain a compact approximation of the data, which makes the visualization highly efficient and flexible. In addition, the selected exemplars neatly summarize the entire data set and greatly reduce the cognitive overload in the visualization, making large text corpora easier to interpret. Empirically, we demonstrate the superior performance of EV through extensive experiments on publicly available text data sets.
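The abstract only outlines the pipeline (exemplar selection, iterative estimation of topic proportions, low-dimensional embedding), so the following is a toy sketch under our own assumptions, not the authors' EV algorithm. Three stand-ins are swapped in: exemplars are picked by greedy pivoted Gram-Schmidt column selection (a common deterministic low-rank approximation heuristic); each document's proportions over the exemplars come from nonnegative least squares solved with Lee-Seung style multiplicative updates (in place of the paper's probabilistic model); and a simple barycentric placement on a circle replaces parameter embedding. The function names and the toy term-count matrix `DOCS` are hypothetical.

```python
import math

# Hypothetical toy corpus: rows are documents, columns are term counts.
# Docs 0-1, 2-3 and 4-5 form three loose groups of related documents.
DOCS = [
    [3, 2, 0, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 0, 3, 2, 0],
    [0, 1, 0, 2, 3, 0],
    [0, 0, 0, 0, 1, 4],
    [0, 0, 1, 0, 0, 3],
]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def select_exemplars(docs, k):
    """Assumed stand-in for exemplar selection: greedy pivoted Gram-Schmidt.

    Repeatedly pick the document whose residual (the part not yet explained
    by earlier picks) has the largest norm, then project that direction out
    of every residual. Returns document indices in selection order.
    """
    residuals = [[float(x) for x in d] for d in docs]
    chosen = []
    for _ in range(min(k, len(docs))):
        i = max((j for j in range(len(docs)) if j not in chosen),
                key=lambda j: dot(residuals[j], residuals[j]))
        norm = math.sqrt(dot(residuals[i], residuals[i]))
        if norm == 0.0:
            break  # remaining documents are already spanned by the exemplars
        q = [x / norm for x in residuals[i]]
        chosen.append(i)
        for j, r in enumerate(residuals):
            c = dot(r, q)
            residuals[j] = [a - c * b for a, b in zip(r, q)]
    return chosen


def topic_proportions(doc, basis, iters=200, eps=1e-12):
    """Assumed stand-in for the probabilistic model: nonnegative least squares.

    Minimizes ||doc - sum_t h[t] * basis[t]||^2 subject to h >= 0 using
    multiplicative updates h <- h * (W^T x) / (W^T W h), which keep h
    nonnegative for nonnegative data; h is then normalized to sum to 1.
    """
    k = len(basis)
    h = [1.0 / k] * k
    wtx = [dot(b, doc) for b in basis]
    wtw = [[dot(a, b) for b in basis] for a in basis]  # k x k Gram matrix
    for _ in range(iters):
        wtwh = [dot(row, h) for row in wtw]
        h = [h[t] * wtx[t] / (wtwh[t] + eps) for t in range(k)]
    s = sum(h)
    return [x / s for x in h] if s > 0 else [1.0 / k] * k


def embed(proportions):
    """Assumed stand-in for parameter embedding: barycentric placement.

    Topic t is anchored at angle 2*pi*t/k on the unit circle; each document
    is placed at the proportion-weighted average of the anchors.
    """
    k = len(proportions[0])
    anchors = [(math.cos(2 * math.pi * t / k), math.sin(2 * math.pi * t / k))
               for t in range(k)]
    return [(sum(p * ax for p, (ax, _) in zip(theta, anchors)),
             sum(p * ay for p, (_, ay) in zip(theta, anchors)))
            for theta in proportions]


if __name__ == "__main__":
    exemplars = select_exemplars(DOCS, 3)
    basis = [DOCS[i] for i in exemplars]
    props = [topic_proportions(d, basis) for d in DOCS]
    print("exemplar documents:", exemplars)
    for doc_id, (x, y) in enumerate(embed(props)):
        print(f"doc {doc_id}: ({x:+.3f}, {y:+.3f})")
```

Summarizing the corpus by a handful of exemplars is what keeps this sketch cheap: the least-squares step only needs the k×k Gram matrix of the exemplars plus one k-vector of dot products per document, rather than a model over the full document set.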
Publication date: 2009